
A Task-Level Case Study

Neural Information Processing Systems

This section illustrates how a model's performance may vary across different tasks associated with a new term. We analyzed the performance of Llama-3-Instruct-70B on the new term "wokely."

Task 1 (COMA): The book's cover was described as wokely by several reviewers, so
A. it struggled to attract attention on the bookstore displays despite a …
B. many readers were enticed to buy it, strengthening its presence on …
C. readers were intrigued and the book's sales experienced an unexpected surge worldwide.
D. the publisher decided to release a limited edition with a special …

Task 2: In the previous sentence, does _ refer to …?

Task 3: Is this example in line with commonsense and grammatically correct?

As observed, the model answered correctly only on the COMA task and failed on the other two. On the COMA task, it successfully inferred that "wokely" carries a negative connotation, although the phrase "hard to find a satisfying …" … These results provide a comprehensive evaluation of the model's understanding of the term "wokely."
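The case study above can be sketched as a small per-task, multiple-choice evaluation loop. This is an illustrative sketch only: the task names beyond COMA and the scoring scheme are assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of task-level evaluation for one new term.
# Task names ("COST", "CSJ") and answers are illustrative assumptions.

def evaluate_term(term, tasks, model_answer):
    """Score a model's answer on each task for a single new term."""
    results = {}
    for task_name, (question, gold) in tasks.items():
        prediction = model_answer(task_name, question)
        results[task_name] = (prediction == gold)
    return results

# Toy example: the model gets COMA right but misses the other two tasks,
# mirroring the behaviour described for "wokely" above.
tasks = {
    "COMA": ("Choose the most plausible continuation ...", "A"),
    "COST": ("What does the blank refer to ...", "B"),
    "CSJ":  ("Is this sentence commonsensical? ...", "yes"),
}
fixed_answers = {"COMA": "A", "COST": "C", "CSJ": "no"}
report = evaluate_term("wokely", tasks, lambda t, q: fixed_answers[t])
print(report)  # {'COMA': True, 'COST': False, 'CSJ': False}
```

Per-task booleans like these make it easy to see where understanding of a term breaks down, rather than collapsing everything into one aggregate score.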


NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates

Neural Information Processing Systems

However, existing benchmarks focus on outdated content and limited fields, facing difficulties in real-time updating and leaving new terms unexplored. To address this problem, we propose an adaptive benchmark, NewTerm, for real-time evaluation of new terms. We design a highly automated construction method to ensure high-quality benchmark construction with minimal human effort, allowing flexible updates for real-time information. Empirical results on various LLMs demonstrate over 20% performance reduction caused by new terms. Additionally, while updates to the knowledge cutoff of LLMs can cover some of the new terms, they are unable to generalize to more distant new terms. We also analyze which types of terms are more challenging and why LLMs struggle with new terms, paving the way for future research. Finally, we construct NewTerm 2022 and 2023 to evaluate the new terms updated each year and will continue updating annually. The benchmark and codes can be found at https://anonymous.4open.science/r/NewTerms.
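The "over 20% performance reduction" claim implies a comparison between accuracy on new-term questions and a familiar-term baseline. A minimal sketch of that metric, with purely illustrative numbers (the paper's exact formula is not reproduced here):

```python
# Relative performance drop caused by new terms; the 0.80 / 0.62
# accuracies below are made-up illustrative values, not benchmark results.

def relative_reduction(baseline_acc, newterm_acc):
    """Relative drop in accuracy, as a fraction of the baseline."""
    return (baseline_acc - newterm_acc) / baseline_acc

drop = relative_reduction(baseline_acc=0.80, newterm_acc=0.62)
print(f"{drop:.1%}")  # 22.5%
```

Reporting the drop relative to the baseline (rather than as an absolute difference) keeps the metric comparable across models with different overall accuracy.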




WeTransfer says user content will not be used to train AI after backlash

The Guardian

The popular filesharing service WeTransfer has said user content will not be used to train artificial intelligence after a change in its service terms had triggered a public backlash. The company, which is regularly used by creative professionals to transfer their work online, had suggested in new terms that uploaded files could be used to "improve machine learning models". The clause had previously said the service had a right to "reproduce, modify, distribute and publicly display" content, and the updated version caused confusion among users. A WeTransfer spokesperson said user content had never been used, even internally, to test or develop AI models and that "no specific kind of AI" was being considered for use by the Dutch company. The firm said: "There's no change in how WeTransfer handles your content in practice."



NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates

Deng, Hexuan, Jiao, Wenxiang, Liu, Xuebo, Zhang, Min, Tu, Zhaopeng

arXiv.org Artificial Intelligence

However, existing benchmarks focus on outdated content and limited fields, facing difficulties in real-time updating and leaving new terms unexplored. To address this problem, we propose an adaptive benchmark, NewTerm, for real-time evaluation of new terms. We design a highly automated construction method to ensure high-quality benchmark construction with minimal human effort, allowing flexible updates for real-time information. Empirical results on various LLMs demonstrate over 20% performance reduction caused by new terms. Additionally, while updates to the knowledge cutoff of LLMs can cover some of the new terms, they are unable to generalize to more distant new terms. We also analyze which types of terms are more challenging and why LLMs struggle with new terms, paving the way for future research. Finally, we construct NewTerm 2022 and 2023 to evaluate the new terms updated each year and will continue updating annually.


Single Ground Truth Is Not Enough: Add Linguistic Variability to Aspect-based Sentiment Analysis Evaluation

Yang, Soyoung, Cho, Hojun, Lee, Jiyoung, Yoon, Sohee, Choi, Edward, Choo, Jaegul, Cho, Won Ik

arXiv.org Artificial Intelligence

Aspect-based sentiment analysis (ABSA) is the challenging task of extracting sentiment along with its corresponding aspects and opinions from human language. Due to the inherent variability of natural language, aspect and opinion terms can be expressed in various surface forms, making their accurate identification complex. Current evaluation methods for this task often restrict answers to a single ground truth, penalizing semantically equivalent predictions that differ in surface form. To address this limitation, we propose a novel, fully automated pipeline that augments existing test sets with alternative valid responses for aspect and opinion terms. This approach enables a fairer assessment of language models by accommodating linguistic diversity, resulting in higher human agreement than single-answer test sets (up to 10%p improvement in Kendall's Tau score). Our experimental results demonstrate that Large Language Models (LLMs) show substantial performance improvements over T5 models when evaluated using our augmented test set, suggesting that LLMs' capabilities in ABSA tasks may have been underestimated. This work contributes to a more comprehensive evaluation framework for ABSA, potentially leading to more accurate assessments of model performance in information extraction tasks, particularly those involving span extraction.
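The core idea of the augmented test set — accepting any of several valid surface forms rather than a single gold answer — can be sketched as follows. This is a simplified illustration under assumed matching rules (case-insensitive exact match); the paper's actual pipeline is not reproduced here.

```python
# Hedged sketch of multi-reference span evaluation for ABSA:
# a prediction is correct if it matches ANY valid surface form.

def span_match(prediction, valid_answers):
    """Case-insensitive match against a set of alternative gold spans."""
    normalized = prediction.strip().lower()
    return normalized in {a.strip().lower() for a in valid_answers}

# "battery life" and "battery" are treated as equivalent aspect spans.
gold = {"aspect": ["battery life", "battery"],
        "opinion": ["excellent", "great"]}
print(span_match("Battery", gold["aspect"]))  # True  (accepted alternative)
print(span_match("screen", gold["aspect"]))   # False (still wrong)
```

Under a single-ground-truth scheme, "battery" would be penalized even though it identifies the same aspect; the multi-reference check avoids that penalty without accepting genuinely wrong spans.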


Dictionary.com's Largest Update (Re)defines Thousands Of Words, Focusing On Identity

NPR Technology

Dictionary.com has updated thousands of entries to reflect the changing use of language in 2020, particularly in subjects like race, gender, health, technology and politics. Anyone grasping for the right word or phrase to describe life in 2020 now has a larger lexicon to work with. Dictionary.com has updated thousands of entries and added hundreds of words in its largest release to date, a reflection of the ways in which society and language have evolved even in just the past few months. The digital dictionary announced earlier this week that it updated more than 15,000 entries and added 650 brand new terms.